Identifying Verbal Collocations in Wikipedia Articles
Abstract
In this paper, we focus on various methods for detecting verbal collocations, i.e. verb-particle constructions and light verb constructions, in Wikipedia articles. Our results suggest that for verb-particle constructions, POS-tagging combined with a restriction on the particle yields the best results, whereas the combination of POS-tagging, syntactic information, and restrictions on the nominal and verbal components has the most beneficial effect on identifying light verb constructions. The identification of multiword semantic units can be successfully exploited in several applications in fields such as machine translation and information extraction.
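To make the idea of "POS-tagging plus a restriction on the particle" concrete, here is a minimal sketch of a rule-based candidate filter. It is not the method used in the paper, whose details are not given here. It assumes the input is already tagged with Penn Treebank tags (where `RP` marks a particle and verb tags start with `VB`); the particle list, the window size, and the function name `find_vpc_candidates` are all hypothetical choices made for illustration.

```python
# Illustrative sketch only; not the paper's implementation.
# Assumes tokens are pre-tagged with Penn Treebank POS tags.

# Hypothetical particle whitelist (the "restriction on the particle").
ALLOWED_PARTICLES = {"up", "out", "off", "down", "away", "over", "back", "on", "in"}

# Hypothetical limit on how far left of the particle we look for a verb.
WINDOW = 4


def find_vpc_candidates(tagged_tokens, allowed=ALLOWED_PARTICLES, window=WINDOW):
    """Return (verb, particle) pairs for RP-tagged tokens from `allowed`
    that have a verb within `window` tokens to their left."""
    candidates = []
    for i, (token, tag) in enumerate(tagged_tokens):
        # Keep only tokens tagged as particles and present in the whitelist.
        if tag != "RP" or token.lower() not in allowed:
            continue
        # Scan leftwards for the nearest verb inside the window.
        for j in range(i - 1, max(i - window, 0) - 1, -1):
            if tagged_tokens[j][1].startswith("VB"):
                candidates.append((tagged_tokens[j][0], token))
                break
    return candidates


sentence = [("She", "PRP"), ("gave", "VBD"), ("the", "DT"),
            ("book", "NN"), ("up", "RP"), (".", ".")]
print(find_vpc_candidates(sentence))
```

A filter like this only proposes candidates; how well it performs depends on the tagger's accuracy in separating particles from prepositions, which is why the sketch relies on the `RP` tag rather than on the word form alone.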
Similar resources
The effect of verbal and visuospatial working memory spans on collocation processing in learners of English
Much interest has recently been directed toward the knowledge of collocations in the field of second language learning since they have been asserted to improve fluency. The current study was intended to examine the effect of verbal and visuospatial working memory spans on the processing of collocations using a Self-Paced Reading Task (SPRT) and relevant working memory tasks. To this end, partici...
Constructing a Collocation Learning System from the Wikipedia Corpus
The importance of collocations for success in language learning is widely recognized. Concordancers, originally designed for linguists, are among the most popular tools for students to obtain, organize, and study collocations derived from corpora. This paper describes the design and development of a collocation learning system that is built from Wikipedia text and provides language learners wit...
Advertising Keyword Suggestion Using Relevance-Based Language Models from Wikipedia Rich Articles
When emerging technologies such as Search Engine Marketing (SEM) face tasks that require human level intelligence, it is inevitable to use the knowledge repositories to endow the machine with the breadth of knowledge available to humans. Keyword suggestion for search engine advertising is an important problem for sponsored search and SEM that requires a goldmine repository of knowledge. A recen...
Discourse Connective - A Marker for Identifying Featured Articles in Biological Wikipedia
Wikipedia is a free-content Internet encyclopedia that can be edited by anyone who accesses it. As a result, Wikipedia contains both featured and non-featured articles. Featured articles are high-quality articles and non-featured articles are poor-quality articles. Since there is an exponential growth of Wikipedia articles, the need to identify the featured Wikipedia articles has become indispen...
The Workshops of the Tenth International AAAI Conference on Web and Social Media
We investigate the automatic generation of Wikipedia articles as an alternative to its manual creation. We propose a framework for creating a Wikipedia article for a named entity which not only looks similar to other Wikipedia articles in its category but also aggregates the diverse aspects related to that named entity from the Web. In particular, a semi-supervised method is used for determinin...